fix(ai/recipes): declare max_batch_tokens on google embedding recipe by ChenyqThu · Pull Request #1016 · garrytan/gbrain

ChenyqThu · 2026-05-15T08:01:04Z

Why

google is the only first-party embedding recipe still missing max_batch_tokens after v0.32 (#779) landed the once-per-process startup warning. Operators routing through google:gemini-embedding-001 (the default-provider path after v0.27 native gateway) see the warning on every gbrain query, every kos-compat-api /ingest response, and every cron gbrain invocation:

[ai.gateway] recipe "google" declares an embedding touchpoint without max_batch_tokens; recursion is the only safety net for batch caps.

For CJK-dense or large-payload batches the absent field also forces the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting — one wasted HTTP round-trip per oversized batch.

The existing test test/ai/no-batch-cap-suppression.serial.test.ts even explicitly pins this state with the comment "google should warn (it has fixed-cap models)", framing it as a known TODO.

Patch

Three field additions to google.touchpoints.embedding:

max_batch_tokens: 20_000,
chars_per_token: 2,
// safety_factor left at gateway default 0.8

Rationale

max_batch_tokens: 20_000 — Google's documented per-text cap is 2 048 tokens; ~20 k tokens per request is the soft cap before gemini-embedding-001 starts emitting 429s.
chars_per_token: 2 — English averages ~4 chars/token (OpenAI default), CJK ~1.5. Picking 2 reflects realistic density on mixed-corpora brains so pre-split stays safe on CJK-heavy payloads without over-shrinking on English.
safety_factor — left at gateway default 0.8. Pre-split lands at 20 000 × 0.8 / 2 = 8 000 chars/batch.

Test follow-through

Two regression tests pinned google as the canary "real provider with no cap declared":

test/ai/no-batch-cap-suppression.serial.test.ts: assumed google STILL warned. With google capped, the test flips to assert the stronger invariant — no first-party recipe warns, because every native/openai-compat recipe now declares either max_batch_tokens or no_batch_cap.
test/ai/adaptive-embed-batch.test.ts: checked contractMatch.length > 0. After this patch the canary set is empty → toBe(0). The once-per-process suppression mechanism is still exercised by the firstCallCount stability check earlier in the same test.

Both tests still gate the suppression machinery; they just no longer require any first-party recipe to be the canary. If maintainers prefer a synthetic test-only uncapped recipe to keep a positive "warning fires when expected" assertion, happy to add one in a follow-up.

Validation

bun run typecheck clean
bun test test/ai/ — 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)

Field repro

Filed downstream from the Jarvis KOS v2 fork (ChenyqThu/jarvis-knowledge-os-v2), where the google native gateway is the default embedding provider since the v0.27 M3 cutover. The warning has been load-bearing log noise for ~6 weeks across kos-compat-api responses, cron logs, and operator queries. Same pattern as #627 / upstream fixwave #682+#741 (forward-bootstrap mcp_request_log) — small fork-local patch carried while upstream evaluates.

🤖 Generated with Claude Code

google was the only first-party embedding recipe still missing max_batch_tokens after v0.32 garrytan#779 landed the once-per-process startup warning. Operators routing through google:gemini-embedding-001 (the default-provider path after v0.27 native gateway) saw the warning on every `gbrain query`, every kos-compat-api / MCP `/ingest` response, and every cron `gbrain` invocation. For CJK-dense or large-payload batches the absent field also forced the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting. Declared: - max_batch_tokens: 20_000 — Google's per-text cap is 2048 tokens; ~20k tokens/request is the soft cap before gemini-embedding-001 starts emitting 429s. - chars_per_token: 2 — CJK density on mixed corpora (English averages ~4, CJK ~1.5; 2 keeps pre-split safe for both). - safety_factor left at gateway default 0.8 → pre-split lands at ~8 000 chars/batch, well under any per-request floor Google publishes. Two existing regression tests pinned google as the canary "real provider with no cap declared": - test/ai/no-batch-cap-suppression.serial.test.ts assumed google STILL warned (the comment explicitly called it a fixed-cap model waiting for someone to cap it). With this patch google joins the capped set, so the test flips to assert the strong invariant: NO first-party recipe warns, because every native and openai-compat recipe now declares either max_batch_tokens or no_batch_cap. - test/ai/adaptive-embed-batch.test.ts checked `contractMatch.length > 0`. After this patch the canary set is empty, so `toBe(0)`. The once-per-process suppression mechanism is still exercised by the `firstCallCount` stability check earlier in the same test. Validation: - bun run typecheck clean - bun test test/ai/ — 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)

PR garrytan#1016 (upstream-fix/google-recipe-max-batch-tokens) also touches two regression tests that pinned google as the canary "real provider with no cap declared". Without backporting them to fork master, `bun test test/ai/` would 2-fail on this tree even though google.ts has been capped here since af2a806. - no-batch-cap-suppression.serial.test.ts: replaces the "STILL warns for google" test with a stronger invariant — NO first-party recipe warns, because every native/openai-compat recipe now declares either max_batch_tokens or no_batch_cap. - adaptive-embed-batch.test.ts: inverts `contractMatch.length > 0` to `=== 0` and adds google to the "must be absent from warnings" assertion list (sibling to voyage + openai). Validation: `bun test test/ai/` → 144 pass / 0 fail. Also: TODO.md updates link PR-1016 + PR-1017 from the related entries; PR-2 oauth-bootstrap entry flips to FILED with the same patch metadata. On upstream merge of either PR, the next sync auto-drops these local edits via clean text merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

JARVIS-ARCHITECTURE.md §6.25 records the 2026-05-15 follow-up session: diagnostic correction (the historical 117k stderr "fire" that wasn't), two upstream PRs filed (garrytan#1016 google max_batch_tokens, garrytan#1017 oauth_clients forward-bootstrap), CJK keyword-only eval verdict, overlap-matrix verdict, M2-B + M2-C verdicts, mechanical cleanup (48 sync_failures ack'd, bun-test hang root-caused), and the new [facts:absorb] sub-process DB-connection latent bug. 8 TODO items closed, 1 new entry filed. CLAUDE.md "Chinese-first knowledge base" rule tightened per the CJK-eval verdict: the strict invariant is *compound CJK (4+ Han chars without whitespace) requires vector*; English and 2-3 char standalone CJK match fine on keyword. Operational guidance unchanged — the modal operator query on this brain is a compound CJK phrase that depends on vector being live. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

v0.37.0.0 upstream sync (11 versions, 12 commits, v0.35.6.0 → v0.37.0.0). Conflicts (4) resolved: - CLAUDE.md: keep fork (upstream content lives in docs/CLAUDE-UPSTREAM.md; selected new sections will be ported in Phase 7). - llms-full.txt: take upstream then bun run build:llms in Phase 2. - test/ai/adaptive-embed-batch.test.ts: take upstream — v0.36.1.1 garrytan#1083 cherry-pick superseded our PR garrytan#1016 backport tests. - test/ai/no-batch-cap-suppression.serial.test.ts: take upstream — same. Auto-merged cleanly: VERSION (→ 0.37.0.0), CHANGELOG.md, README.md, package.json, bun.lock, .gitignore, skills/RESOLVER.md (fork ## KOS-Jarvis extensions section preserved at L128, upstream skillpack-harvest entry added at L62), skills/manifest.json (47 total, fork 4 KOS-Jarvis entries preserved), src/core/pglite-engine.ts (WAL fork patch survived at L207 — line shift from L200 due to upstream import additions). Fork-protected paths (skills/kos-jarvis/, server/, workers/, scripts/launchd/) untouched — verified via diff master..HEAD. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 1 conflict resolution took upstream versions of these two tests assuming v0.36.1.1 garrytan#1083 cherry-pick was the equivalent of fork PR garrytan#1016. Wrong: v0.36.1.1 went with the "warn only for the configured embedding provider" filter path (gateway.ts), while fork kept the "declare max_batch_tokens on google recipe" path (google.ts +7 lines, retained through merge). Net effect of both paths together: - gateway.ts warning filter narrows to configured provider (upstream) - google.ts has max_batch_tokens declared so won't warn anyway (fork) Fork tests expect 0 warnings; upstream tests expect 1 (google when google configured). Took fork view back — google.ts business value (pre-split CJK batches via max_batch_tokens=20_000 + chars_per_token=2, reduces reactive token-cap discovery overhead) justifies retaining the fork patch. Verification: bun test test/ai/ → 224 pass / 0 fail / 764 expect(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

JARVIS-ARCHITECTURE.md §6.29: - 11 versions overview (v0.35.6.0 → v0.37.0.0, 12 commits, 333 files) - 4-conflict resolution log (test/ai/ reverted to fork after finding v0.36.1.1 garrytan#1083 went a different fix path from PR garrytan#1016) - Schema v66 → v78 migration evidence (12 migrations auto-bootstrap) - ZeroEntropy lock procedure (config set ze_switch_declined_at + ze_switch_prompt_shown=true since CLI has no --decline flag) - Health score 70 → 80 evidence + new v0.36.x checks all PASS - Smoke evidence (typecheck/test/curl/query Chinese+English/patrol) CLAUDE.md: Latest sync story pointer updated to §6.29. TODO.md: - Top intro replaced with v0.37.0.0 sync context - L824 PR garrytan#1016 verdict revised — NOT superseded by v0.36.1.1 - Added 5 new P1 entries (post-v0.37.0.0): - dream-cycle LLM spend observation (calibration phases) - embedding_columns registry declare (optional) - image ingestion roadmap decision - doctor --remediate vs kos-patrol evaluation pointer - typed-claim fence for OH transcripts - Branch cleanup deferred to Lucien (worktree refs block delete) llms-full.txt regenerated via bun run build:llms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…12 commits, v0.35.6.0 → v0.37.0.0) 333 files / +52455 / -3317 LoC. 4 conflicts resolved (CLAUDE.md / llms-full.txt / 2 test/ai/ files — last 2 reverted to fork view after discovering v0.36.1.1 garrytan#1083 went a different fix path from PR garrytan#1016). Schema v66 → v78 auto-bootstrap (12 migrations, zero manual ALTER). WAL fork patch survived at src/core/pglite-engine.ts:207. ZeroEntropy switch lock applied (ze_switch_declined_at + ze_switch_prompt_shown). Doctor health_score 70 → 80; 3140 pages preserved; vector(1536) unchanged. Smoke: bun test test/ai/ 224/224 green. kos.chenge.ink/health returns 0.37.0.0 + engine=postgres. kos-patrol kickstart exit 0, dashboard written. Chinese + English query smoke top-hits in healthy band. Fork dirs unchanged (10). Branch cleanup deferred (worktree refs). PR garrytan#1016 verdict revised: NOT superseded — upstream filter path co-exists with fork declare-max_batch_tokens path. Full story: docs/JARVIS-ARCHITECTURE.md §6.29. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ChenyqThu mentioned this pull request May 17, 2026

feat(dream): --archive-dir flag persists CycleReport JSON + atomic latest.json symlink #1133

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ai/recipes): declare max_batch_tokens on google embedding recipe#1016

fix(ai/recipes): declare max_batch_tokens on google embedding recipe#1016
ChenyqThu wants to merge 1 commit into
garrytan:masterfrom
ChenyqThu:upstream-fix/google-recipe-max-batch-tokens

ChenyqThu commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ChenyqThu commented May 15, 2026

Why

Patch

Rationale

Test follow-through

Validation

Field repro

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant